GeoAlign: Interpolating Aggregates over Unaligned Partitions
Abstract
Answering crucial socioeconomic questions often requires combining and comparing data across two or more independently collected data sets. However, these data sets are often reported as aggregates over data collection units, such as geographical units, and those units may differ from one data set to another. Examples of geographical units include counties, zip codes, and school districts, and units of different types can be spatially incongruent: their boundaries need not nest or align. To compare such data, the aggregates must be realigned from the source units to a common set of spatially congruent target units. Existing intelligent areal interpolation/realignment methods, however, make strong assumptions about the spatial properties of the attribute of interest, based on domain knowledge of its distribution. A more practical approach is to use available reference data sources to aid the alignment, and the choice of references is vital to the quality of the prediction. In this paper, we devise GeoAlign, a novel multi-reference crosswalk algorithm that estimates aggregates in the desired target units. GeoAlign adapts to new attributes without requiring either domain knowledge of the attribute's distribution or knowledge of its spatial properties in a Geographic Information System (GIS). We show that GeoAlign extends easily to aggregate realignment in multi-dimensional space for general use. Experiments on real, public government datasets show that GeoAlign achieves equal or better accuracy, measured by root mean square error (RMSE), than the leading state-of-the-art approach without sacrificing scalability or robustness.
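The abstract describes realigning aggregates from source units to target units with the help of reference data. The sketch below illustrates that general idea under stated assumptions; it is not the GeoAlign algorithm. It shows a single-reference proportional crosswalk (each source aggregate is split across the target units it overlaps, in proportion to a reference attribute such as population counted on each source/target intersection), a fixed-weight blend of several such crosswalks, and the RMSE used to score estimates. The function names `crosswalk`, `blend`, and `rmse`, the intersection-level reference input, the equal-split fallback, and the zip-code/county toy numbers are all hypothetical. How GeoAlign actually selects and weights its references is not reproduced here.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # hypothetical: (source unit, target unit) that overlap


def crosswalk(source_totals: Dict[str, float],
              reference: Dict[Pair, float]) -> Dict[str, float]:
    """Split each source aggregate across the target units it overlaps,
    in proportion to a reference attribute counted on each intersection."""
    ref_per_source: Dict[str, float] = defaultdict(float)
    targets_of: Dict[str, List[str]] = defaultdict(list)
    for (s, t), r in reference.items():
        ref_per_source[s] += r
        targets_of[s].append(t)

    estimate: Dict[str, float] = defaultdict(float)
    for s, value in source_totals.items():
        targets = targets_of.get(s, [])
        if not targets:
            continue  # no known overlap: this source's mass is dropped
        denom = ref_per_source[s]
        for t in targets:
            # Assumed fallback: equal split if the reference is zero in this source.
            share = reference[(s, t)] / denom if denom > 0 else 1.0 / len(targets)
            estimate[t] += value * share
    return dict(estimate)


def blend(source_totals: Dict[str, float],
          references: List[Dict[Pair, float]],
          weights: List[float]) -> Dict[str, float]:
    """Weighted average of single-reference crosswalks. The weights are
    supplied by the caller; choosing them well is the hard part and is
    not attempted here."""
    if len(references) != len(weights) or any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("need one non-negative weight per reference, not all zero")
    total_w = sum(weights)
    combined: Dict[str, float] = defaultdict(float)
    for ref, w in zip(references, weights):
        for t, v in crosswalk(source_totals, ref).items():
            combined[t] += (w / total_w) * v
    return dict(combined)


def rmse(estimate: Dict[str, float], truth: Dict[str, float]) -> float:
    """Root mean square error over the target units present in `truth`."""
    errors = [(estimate.get(t, 0.0) - truth[t]) ** 2 for t in truth]
    return sqrt(sum(errors) / len(errors))


# Hypothetical toy data: two zip codes (sources) and two counties (targets).
zip_totals = {"Z1": 100.0, "Z2": 50.0}
ref_population = {("Z1", "C1"): 300.0, ("Z1", "C2"): 100.0, ("Z2", "C2"): 200.0}
ref_housing = {("Z1", "C1"): 1.0, ("Z1", "C2"): 1.0, ("Z2", "C2"): 5.0}
hypothetical_truth = {"C1": 60.0, "C2": 90.0}

if __name__ == "__main__":
    est = blend(zip_totals, [ref_population, ref_housing], [0.5, 0.5])
    print(est, rmse(est, hypothetical_truth))
```

With these toy inputs, the population-based crosswalk assigns 75 to each county, the housing-based one assigns 50 to C1 and 100 to C2, and an equal-weight blend gives 62.5 and 87.5. In every case the source total of 150 is preserved, because each source unit's shares sum to one. The example also shows why the choice of references matters: the two references disagree, and the blend's accuracy depends entirely on the weights.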
Similar resources
C Quintic Spline Interpolation Over Tetrahedral Partitions
We discuss the implementation of a C quintic superspline method for interpolating scattered data in ℝ³ based on a modification of Alfeld’s generalization of the Clough-Tocher scheme described by Lai and LeMéhauté [4]. The method has been implemented in MATLAB, and we test the accuracy of reproduction on a basis of quintic polynomials. We present numerical evidence that when the partition i...
A Local Lagrange Interpolation Method Based on C1 Cubic Splines on Freudenthal Partitions
A trivariate Lagrange interpolation method based on C1 cubic splines is described. The splines are defined over a special refinement of the Freudenthal partition of a cube partition. The interpolating splines are uniquely determined by data values, but no derivatives are needed. The interpolation method is local and stable, provides optimal order approximation, and has linear complexity.
Bagging Is a Small-Data-Set Phenomenon
Bagging forms a committee of classifiers by bootstrap aggregation of training sets from a pool of training data. A simple alternative to bagging is to partition the data into disjoint subsets. Experiments on various datasets show that, given the same size partitions and bags, disjoint partitions result in better performance than bootstrap aggregates (bags). Many applications (e.g., protein struc...
Local quasi-interpolation by cubic C1 splines on type-6 tetrahedral partitions
We describe an approximating scheme based on cubic C1 splines on type-6 tetrahedral partitions using data on volumetric grids. The quasi-interpolating piecewise polynomials are directly determined by setting their Bernstein–Bézier coefficients to appropriate combinations of the data values. Hence, each polynomial piece of the approximating spline is immediately available from local portions of ...